



FIG. 1



FIG. 2



FIG. 3



FIG. 4



FIG. 5A



FIG. 5B





FIG. 6A



FIG. 6B



FIG. 7A



FIG. 7B



FIG. 8



FIG. 9



FIG. 10



FIG. 11A



FIG. 11B



FIG. 12





FIG. 14



FIG. 15



FIG. 16A



FIG. 16B



FIG. 16C



FIG. 16D



FIG. 17



FIG. 18A



FIG. 18B



FIG. 18C




FIG. 19



FIG. 20

© 1995 by The McGraw-Hill Companies, Inc.



FIG. 21



FIG. 22A



FIG. 22B



FIG. 23A



FIG. 23B



FIG. 23C

| ADDRESS | FUNCTION    |
|---------|-------------|
| 0x710   | FFT RADIX 2 |
| 0x905   | FIR FILTER  |
| 0x2100  | CONVOLUTION |

FIG. 24A

DECODED MICROCODE



FIG. 24B



FIG. 25



FIG. 26



FIG. 27



FIG. 28



FIG. 29



FIG. 30A



FIG. 30B



FIG. 31




FIG. 32



FIG. 33



FIG. 34



FIG. 35



FIG. 36

# Stellar Technologies, Ltd.



The developer of **RAMDSP™**  
*Integrating Ram & Digital Signal Processing*

**RAMDSP™**

Provides cost-effective high performance by tightly coupling 2 - 64 Mbits of DRAM on the same chip as the DSP engine.

# Stellar Technologies, Ltd.


- Background
- Business Model
- Comparisons
- Architecture
- Issues and Next Steps



# Stellar Background


Founded by Richard Rubinstein, President and CEO in 1994.

Mr. Rubinstein has extensive background and experience in VLSI technology, DSP, team building, and product-line P&Ls.

- Intel
- Cypress Semiconductor
- Data General
- Sharp Micro Electronics

# Stellar Background

cont.

- Santolina Associates of San Jose, CA invested in Stellar Technologies, Ltd. in December 1996.
- Santolina Associates has a member on Stellar Technologies' Board of Directors.

# Stellar Background

## Patents


### Patent Applications Filed

Stellar's Patent Counsel

Marger, Johnson, McCollum & Stollowitz, P.C., Portland, Oregon

Mr. Stollowitz is Stellar's patent attorney.



# Stellar Business Model

## DRAM Product Focus


- Fast ATM switch including signal processing and computation
- Web-TV: ADSL, JPEG, and Web Browser
- Game Machines, PDA devices, DVD Applications
- Video Systems Applications: frame buffer, MPEG, and graphics
- Telecommunications
- High-Performance Ethernet Physical Layer

(All of the above require high performance and substantial amounts of DRAM.)

# Stellar Business Model

## Potential Micron / Stellar Business Relationships

- Micron's technical and market focus brings embedded technology to market with 2 - 4x the performance of TI's TMS320C6X at reduced power and system cost
- Micron manufactures and sells products that are defined and designed by Stellar and then transferred to Micron for product manufacturing
- Stellar's interest is to obtain a royalty stream from **RAMDSP™** products sold by Micron

# Stellar Business Model

## Potential Micron / Stellar Business Relationships

Stellar provides **RAMDSP™** silicon design and software to support high-performance, low-power, low-cost DRAM applications.

- **RAMDSP™** may function as a DSP accelerator working in conjunction with a separate front-end controller, or
- **RAMDSP™** may implement stand-alone applications that don't require a front-end controller.
- For either of the above, Stellar
  - designs with Micron's tools,
  - sells a license to Micron for **RAMDSP™** technology, and
  - obtains a royalty stream for designs and products sold.
- Stellar will train Micron engineers on **RAMDSP™** software tools and support custom DSP library development.

# Stellar Business Model

## Milestones and Costs


- Q4/97: Functional Specification for MPEG Application
- Q1/98: Functional Simulator and Assembler
  - Architecture Specification for MPEG Application complete
- Q2/98: Simulator and Behavioral RTL verified, \$650,000
- Q1 - Q2/98: MPEG Application and Market Development, \$450,000
- Q4/98: Software Tools plus MPEG port complete, \$1.6 million
- Q1/99: Prototype Silicon, \$3.5 million

(The above funds prepay part of the IP license and royalties.)

# Stellar Architecture




- BIU: Bus Interface Unit
- MCCU: Memory Centric Control Unit
- DRAM-EUI: DRAM Execution Unit Interface
- EU: Execution Unit

# Stellar Architecture

## Development Process




# Stellar Cost/Performance

## Power

Stellar's RAMDSP™ architecture has been compared to TI's device:

TMS320C6X - 200 MHz, 1,600 MIPS

- Power dissipation: approximately 1/2\*\*

\*\* At comparable performance to the TI device, RAMDSP™ will require approximately 1/2 the power.

# Stellar Cost/Performance

## Processing Speed

Stellar's RAMDSP™ architecture has been compared to TI's device:

TMS320C6X - 200 MHz, 1,600 MIPS

- Performance: approximately 1 - 4x\*\*\*

\*\*\* The RAMDSP™ product family will support a performance range of 1 to 4 times that of the TMS320C6X on a single chip.

# Stellar Cost/Performance

## System Cost

Stellar's RAMDSP™ architecture has been compared to TI's device:

TMS320C6X - 200 MHz, 1,600 MIPS

- System cost, approximately\*:
  - 1/8 x TI with 4 - 8 Mb DRAM
  - 1/4 x TI with 8 - 16 Mb DRAM
  - 1/3 x TI with 16 - 64 Mb DRAM

\* Based upon DRAM integration

# Stellar Architecture

## Data RAM & Local RAM



# Stellar Architecture

## Data RAM & Local RAM for Video Interface



# Stellar Architecture


- Low Speed - DMEM nominal cycle time = 40 ns = major cycle
- High Speed - EU and SMEM nominal cycle time = 5 ns
- Intermediate speed - SEQ and DMEM MAC need to be four times faster than DMEM
  - Therefore, nominal cycle time = 10 ns
- The ratio of low-speed, intermediate-speed, and high-speed cycles per major cycle is:
  - 1:4:8 for 40 ns DRAM
  - 1:4:4 for 20 ns DRAM
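
The ratios above follow from dividing the major cycle by each nominal cycle time. A minimal sketch of that arithmetic, assuming the intermediate clock always runs four times faster than the DRAM and the EU cycle stays at 5 ns (the function and parameter names are illustrative, not from the slides):

```python
def cycles_per_major_cycle(dram_ns, eu_ns=5, intermediate_speedup=4):
    """Return (low, intermediate, high) speed cycle counts in one major cycle.

    dram_ns is the DMEM cycle time, which defines the major cycle.
    The intermediate (SEQ) clock is assumed to be `intermediate_speedup`
    times faster than the DRAM clock.
    """
    intermediate_ns = dram_ns / intermediate_speedup
    return (
        1,
        round(dram_ns / intermediate_ns),
        round(dram_ns / eu_ns),
    )


print(cycles_per_major_cycle(40))  # (1, 4, 8)
print(cycles_per_major_cycle(20))  # (1, 4, 4)
```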

# Stellar Architecture

## Memory Centric Controller Unit cont.


RAMDSP technology applies to both DRAM and SRAM data memory

- Memory Addressing for Execution Unit
- Operation Control Fields
- Execution Unit timing
- Memory interface unit timing and control
- Control for reconfiguring and sharing memory

# Stellar Architecture

## Memory Centric Controller Unit cont.


- Power Management Control
- Addressing for Multiple Memory Blocks, Multiple Execution Units, and Memory I/O
- Provision for Fast Local Storage
- Full Support for Conditional Execution
- Provides Complete Built-in Self Test Capabilities
  - reduces manufacturing costs
  - implements control information for repair

# Stellar Architecture

## MCCU Interface Controls


- Status Block Information Control from Engine to Core processor
- Interrupt Handling Processor
- Engine To Engine Communications
  - Multi-processor support
  - Status
  - Interrupts
  - Stand-alone processing
- Support for Memory Mapped Instructions from Core processor to Engine



# Stellar Architecture

## MCCU Block Diagram



# Stellar Architecture

## MCCU Sequence Controller





# Stellar Architecture

## MicroCode Store Segmentation




# Stellar Architecture

## MicroCode Store




### End of Operations Bit

- The EOO Bit is set by the assembler
- When the EOO Bit is encountered, that MicroCode block is idle until the start of the next major cycle
- Note that the units are idle during a NOP

### Example of EOO Bit for DMEM AG

| Assembler Code | Machine Code | EOO Bit |
|----------------|--------------|---------|
| inc 3, setaddr | 1 2 3 0      | 1       |
| nop            | 0 0 0 0      | 0       |
| nop            | 0 0 0 0      | 0       |
| nop            | 0 0 0 0      | 0       |
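
The EOO behavior can be sketched as a small simulation. This is only an illustration under stated assumptions: a microcode block is modeled as a list of `(mnemonic, eoo_bit)` slots, and the slot carrying the EOO bit is assumed to execute before the unit goes idle. The names `run_major_cycle` and `dmem_ag` are hypothetical.

```python
def run_major_cycle(block):
    """Execute one microcode block for a single major cycle.

    Slots run in order until a slot with the EOO bit set has executed.
    The unit is then idle for the remaining slots of the major cycle.
    Returns the executed mnemonics and the number of idle slots.
    """
    executed = []
    for mnemonic, eoo in block:
        executed.append(mnemonic)
        if eoo:
            break
    idle_slots = len(block) - len(executed)
    return executed, idle_slots


# The DMEM AG example from the table: one real operation, then padding NOPs.
dmem_ag = [("inc 3, setaddr", 1), ("nop", 0), ("nop", 0), ("nop", 0)]
executed, idle = run_major_cycle(dmem_ag)
print(executed, idle)  # ['inc 3, setaddr'] 3
```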

# Stellar Architecture

## Data RAM Partitioning

- Data RAM is addressed in 8 assignable segments
- Execution Unit and Bus Interface Unit swap data segments transparently
- Fixed segments preserve algorithm state
- MCCU controls reconfiguration and coordinates processing with I/O

# Stellar Architecture

## Data RAM Partitioning



1. Current execution with segment 1 while last results are unloaded from segment 2.
2. Execution continues with segment 1 while next data is loaded into segment 2.
3. Segments are reassigned - execute with segment 2 while previous results are unloaded from segment 1.
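
The three steps above describe a ping-pong (double-buffered) use of two segments. A minimal software sketch of the idea follows, assuming one segment is owned by the Execution Unit and the other by the Bus Interface Unit in each step, and that reassignment is a pure ownership swap with no data copy. The function `process_stream` and its arguments are illustrative, and the real hardware overlaps the two sides in time rather than running them one after the other.

```python
def process_stream(blocks, compute):
    """Process input blocks with two segments that swap roles each step."""
    segments = {1: None, 2: None}   # contents of segment 1 and segment 2
    eu_seg, biu_seg = 1, 2          # current owner of each segment
    pending = list(blocks)
    results = []

    while pending or segments[1] is not None or segments[2] is not None:
        # BIU side: unload the previous results, then load the next block.
        if segments[biu_seg] is not None:
            results.append(segments[biu_seg])
            segments[biu_seg] = None
        if pending:
            segments[biu_seg] = pending.pop(0)

        # EU side: compute in place on its own segment.
        if segments[eu_seg] is not None:
            segments[eu_seg] = compute(segments[eu_seg])

        # Reassign segments: ownership swaps, the data does not move.
        eu_seg, biu_seg = biu_seg, eu_seg

    return results


print(process_stream([1, 2, 3], lambda x: x * 10))  # [10, 20, 30]
```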

# Stellar Architecture

## Bus Interface Unit


- Provides a high bandwidth programmable interface to standard and custom busses
- Shares control with the MCCU
- Direct path to DRAM I/O Blocks



# Stellar Architecture



## Bus Interface Unit

- Maintains high bandwidth communication between BIU and DRAM block
- Support for various standards
  - Synchronous DRAM
  - EDO
  - Rambus
  - PCI
- Simple interface can substitute for BIU, e.g., standard memory bus interface

# Stellar Architecture



## Execution Unit


- Extendable Architecture
- Two- to Eight-Way SIMD
- Local Memory Support for Execution Units
- Branch and Interrupt Status Feedback To MCCU

# Stellar Architecture

## Execution Unit Interface


- Provides an interface that matches DRAM bandwidth to the Execution Unit
- Includes Capability for merging data from multiple DRAM memory blocks into one Execution Unit

# Stellar Architecture

## Parallel SIMD Style Execution Unit




# Stellar Architecture

## Matched Data Rates




# Stellar Architecture

## Parallel SIMD Execution Unit

ADD16 (four parallel 16-bit lanes)



# Stellar Architecture

## Parallel SIMD Execution Unit


MULHI16

16-bits 16-bits 16-bits 16-bits

|    |    |    |    |    |    |    |    |
|----|----|----|----|----|----|----|----|
| a0 | b0 | a1 | b1 | a2 | b2 | a3 | b3 |

32-bits 32-bits

|         |         |         |         |
|---------|---------|---------|---------|
| a0 * b0 | a1 * b1 | a2 * b2 | a3 * b3 |

# Stellar Architecture

## Parallel SIMD Execution Unit




MULLO16

16-bits 16-bits 16-bits 16-bits

|    |    |    |    |
|----|----|----|----|
| a0 | a1 | a2 | a3 |

|    |    |    |    |
|----|----|----|----|
| b0 | b1 | b2 | b3 |

32-bits 32-bits

|         |         |
|---------|---------|
| a1 * b1 | a3 * b3 |
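
The packed operations on these slides can be modeled in software. The sketch below is only an illustration under stated assumptions: lanes are unsigned, lane 0 sits in the least significant bits, the low multiply selects lanes 1 and 3 (as the diagram above shows), and the high multiply is assumed to select lanes 0 and 2. All function names are hypothetical.

```python
MASK16 = 0xFFFF


def unpack16(word):
    """Split a 64-bit word into four unsigned 16-bit lanes, lane 0 first."""
    return [(word >> (16 * i)) & MASK16 for i in range(4)]


def pack(lanes, width):
    """Pack lanes of `width` bits into one word, truncating each lane."""
    mask = (1 << width) - 1
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & mask) << (width * i)
    return word


def add16(a, b):
    """Four independent wrapping 16-bit adds; carries never cross lanes."""
    return pack([x + y for x, y in zip(unpack16(a), unpack16(b))], 16)


def mul16(a, b, lanes):
    """Multiply two selected 16-bit lane pairs into two 32-bit products."""
    la, lb = unpack16(a), unpack16(b)
    return pack([la[i] * lb[i] for i in lanes], 32)


def mulhi16(a, b):
    return mul16(a, b, (0, 2))  # lane choice is an assumption


def mullo16(a, b):
    return mul16(a, b, (1, 3))  # matches a1 * b1 and a3 * b3 above


a = pack([1, 2, 3, 4], 16)
b = pack([10, 20, 30, 0xFFFF], 16)
print(unpack16(add16(a, b)))  # [11, 22, 33, 3]
```

Splitting the multiply into two instructions keeps each result at double width (two 32-bit products) while still fitting a 64-bit destination.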

# Stellar Architecture

## SIMD uCode Instructions


- Multiplier can do some Adder operations
- Parallel 8, 16, 32-bit ADD, COMP, SHIFT  
(signed, unsigned, or saturating)
- Parallel 8, 16-bit MUL (to double width)  
(odd or even; signed or unsigned)
- 64-bit AND, OR, XOR
- Parallel PACK and UNPACK (odd or even fields)
- MOVE and MERGE (under mask)
- LDZero, LDOne (for normalization and compression)

# Stellar Architecture


## Special Instructions for MPEG

- Parallel 8, 16, 32-bit Absolute Value of Difference (signed, unsigned, or saturating)

$$c = |a - b|$$

- Parallel 8, 16, 32-bit Add, Round and Shift (signed, unsigned, or saturating)

$$c = (a + b + 2^n) \gg (n + 1)$$

example: n=0

$$c = (a + b + 1) \gg 1$$

example: n=7, b=0

$$c = (a + 2^7) \gg 8$$
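
Both special instructions can be expressed per lane in a few lines. This is a minimal sketch that models lanes as plain Python integers in lists, handles only the unsigned case, and does not model saturation; the function names are illustrative.

```python
def abs_diff(a, b):
    """Per-lane absolute value of difference: c = |a - b|."""
    return [abs(x - y) for x, y in zip(a, b)]


def add_round_shift(a, b, n):
    """Per-lane add, round and shift: c = (a + b + 2**n) >> (n + 1)."""
    return [(x + y + (1 << n)) >> (n + 1) for x, y in zip(a, b)]


# n = 0 gives a rounded average of two pixels.
print(add_round_shift([1, 2, 255], [2, 2, 254], 0))   # [2, 2, 255]

# n = 7 with b = 0 rounds and scales a single value down by 256.
print(add_round_shift([128, 255, 1000], [0, 0, 0], 7))  # [1, 1, 4]

print(abs_diff([5, 0, 200], [7, 3, 100]))  # [2, 3, 100]
```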

# Stellar Architecture

## Vector Programming Model


- 8 element, 64-bit vector registers match DRAM speed to Execution Unit speed
- Sequencer-controller generates vector slices of micro-code, loops and branches
- Conditional execution and vector merge support data-dependent threads

# Stellar Architecture

## EU Opcode Generator




# Stellar Architecture

## EU Opcode Generator




# Stellar Architecture

## Peak Execution Rates




- 200 MHz Execution Unit / 25 MHz DRAM (nominal)
  - 4 x ( 8-bit Mult + 16-bit Add) =  $8 \times 200 \text{ MHz} = 1600 \text{ Mops}$  or
  - 2 x (16-bit Mult + 32-bit Add) =  $4 \times 200 \text{ MHz} = 800 \text{ Mops}$  or
  - 8 x (16-bit Add) =  $8 \times 200 \text{ MHz} = 1600 \text{ Mops}$  or
  - 16 x ( 8-bit Add)
- Memory Bandwidth
  - 8 Bytes  $\times 200 \text{ MHz} = 1600 \text{ MByte/s}$   
(per 64-bit data path)
- Depending on DRAM technology, up to 4 64-bit data paths and execution units may be accommodated
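
The peak figures above are products of the number of parallel operations and the 200 MHz Execution Unit clock. A short sketch of that arithmetic, with hypothetical helper names and each multiply-plus-add pair counted as two operations:

```python
EU_CLOCK_MHZ = 200


def peak_mops(parallel_units, ops_per_unit=1, clock_mhz=EU_CLOCK_MHZ):
    """Peak millions of operations per second for one 64-bit data path."""
    return parallel_units * ops_per_unit * clock_mhz


def bandwidth_mbytes_per_s(path_bits=64, clock_mhz=EU_CLOCK_MHZ):
    """Peak memory bandwidth of one data path in MByte/s."""
    return (path_bits // 8) * clock_mhz


print(peak_mops(4, 2))           # 4 x (8-bit mult + 16-bit add)  -> 1600
print(peak_mops(2, 2))           # 2 x (16-bit mult + 32-bit add) -> 800
print(peak_mops(8))              # 8 x 16-bit add                 -> 1600
print(bandwidth_mbytes_per_s())  # 8 bytes x 200 MHz              -> 1600
```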

# Stellar Architecture

## Data Path Sizes


- Memory R/W: 64 bits x 8 = 512 bits per DRAM cycle, or 256 bits for 50 MHz DRAM
- Registers: 64-bit x 8 x 16 = 8K bits (4K to 8K range)

(above depends on DRAM speed and target application)

- DRAM size: 2 - 64 Mbits (can address 512 Mbits)
- SRAM size: 0 - 64 Kbits

(function of Register File size and application)
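
The data path sizes follow from the ratio of the Execution Unit clock to the DRAM clock. The sketch below reproduces the numbers on this slide, assuming a 200 MHz Execution Unit and reading "64-bit x 8 x 16" as 16 vector registers of 8 elements each (that reading, and the function names, are assumptions).

```python
def bits_per_dram_cycle(eu_mhz=200, dram_mhz=25, path_bits=64):
    """Bits moved per DRAM cycle so one 64-bit path stays busy every EU cycle."""
    return path_bits * (eu_mhz // dram_mhz)


def register_file_bits(width_bits=64, elements=8, registers=16):
    """Total register storage in bits."""
    return width_bits * elements * registers


print(bits_per_dram_cycle(dram_mhz=25))  # 512
print(bits_per_dram_cycle(dram_mhz=50))  # 256
print(register_file_bits())              # 8192, i.e. 8K bits
```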

# Stellar Architecture

## RAMDSP vs. TI C6x Characteristics


- Simpler data paths reduce power.
- 8 port register file for **RAMDSP™** vs. 15 ports for TI
- ~~Better~~ uCode density due to SIMD instructions and vector register set organization (32 - 64 bits vs. 256 bits for TI)
- ~~Better~~ power control circuits (due to lower clock rate)
- All application data 'on chip' eliminates data transfer bottleneck, while memory reconfiguration eliminates on-chip data movement

# Stellar Architecture

## RAMDSP™ vs. TI C6X Characteristics

- Flexible SIMD-style execution allows additional data parallelism and 2 - 4x execution speed
- TI C6X: 2 x 32-bit adds per clock and 2 x 16-bit mults
- RAMDSP™: 2 x 32-bit adds, 4 x 16-bit adds, or 8 x 8-bit adds per clock, and 2 x 16-bit mults or 4 x 8-bit mults
  - or 4 x 32-bit adds, 8 x 16-bit adds, or 16 x 8-bit adds
- And RAMDSP™ can support 2 - 4x the number of data paths per chip based on its lower power and die size requirements

# Stellar Architecture

## Compiler Support


- In addition, a compiler for **RAMDSP™** technology can be developed and made available in 1998.

- The **RAMDSP™** compiler will provide a means for porting C code developed for other DSPs and processors to **RAMDSP™**, and for quickly developing new and custom applications for **RAMDSP™**.

# Stellar Architecture

## Additional Important Benefits


- Elimination of complex IOP (I/O Processor)
- Advanced techniques for reconfiguring and sharing DRAM and SRAM memory
- Memory management of DRAM and SRAM results in seamless execution of complex DSP applications
  - delays due to DMA or I/O are overlapped with computation
- Minimization of on-chip silicon for addressing and control of DRAM, while maintaining high-bandwidth communications to RAM

# Stellar Architecture

## Additional Important Benefits


- Minimal size of memory structure (cell area), while supporting high-throughput execution
- Requirements for moving data between multiple engines and memory blocks substantially reduced
- Low power designs inherent in **RAMDSP™** architecture
- Reconfiguration capabilities for single and multiple engine applications

# Stellar Products

## DRAM Product Focus cont...


### Specialized Memory Devices

- Currently, additional patent applications are being initiated to create a family of specialized memory devices
- These have substantial improvements in cost and performance using **RAMDSP™** technology
- Stellar will disclose this technology to Micron as soon as applications are filed (approximately 1 month)



# Stellar Architecture

## Execution Unit Detail




# Stellar Architecture



## RAMDSP applied to Video Applications


- Low-power solution for the upcoming generation of digital-video consumer products
  - Professional Broadcasting
  - PCs and Workstations
  - DVD-based Video Recorders and Camcorders
- Key enablers to new technologies
  - Design Flexibility
  - Programmability
  - Integrated Memory

# Stellar Architecture

## RAMDSP Design Flexibility Targets MPEG-2


- EU design can be "bulked up" when applied to simple operations such as those used in motion estimation.
- Motion Estimation can require up to 8 Gops of performance or more.
- EU performance scales upward and is only limited by DRAM bandwidth.

# Stellar Architecture

## RAMDSP Design Flexibility Targets MPEG-2


### MPEG-2 Encoder Motion Estimation

Baseline Configuration

- EU: 64-bit data path
- MCCU: ~~256~~<sup>16K</sup> bits SRAM control store

ME Configuration

- EU: 128, 256-bit data path
- MCCU: ~~128~~<sup>16K</sup> bits SRAM control store

NOTE (handwritten, partly illegible): RAM stores are microcode and ... than ... down to SRAM - 1/2 16K ... fill ... which other half executes 1/2 ... see next page



# Stellar Architecture

## RAMDSP MPEG-2 Encoder Solution




cont. from  
Ninth binder prior page!

### Single Chip Solution




Codec Unit  
BIU: Bitstream Capable  
EU: 64-bit data path

Motion Estimation Unit  
EU: 128-bit data path

Motion Estimation Unit  
EU: 128-bit data path

# Stellar Architecture

## RAMDSP Programmability Targets MPEG-2


### MPEG-2 Motion Estimation Algorithms

EU:

$$\sum_{m=0}^{15} \sum_{n=0}^{15} |X_{m,n} - R_{m+i,n+j}|$$

$$\sum_{m=0}^{15} \sum_{n=0}^{15} \left|X_{m,n} - \frac{R_{m+i,n+j} + R_{m+i,n+j+1}}{2}\right|$$

$$\sum_{m=0}^{15} \sum_{n=0}^{15} (X_{m,n} - R_{m+i,n+j})^2$$

MCCU:

- Three-step search
- Hierarchical search
- Full search

...
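
The split described on this slide, where the EU evaluates a block-matching cost and the MCCU drives the search strategy, can be sketched in software. The example below implements the sum of absolute differences and an exhaustive (full) search over non-negative offsets only. It is an illustration under those assumptions, not the RAMDSP microcode, and the names `sad` and `full_search` are hypothetical.

```python
def sad(cur, ref, i, j, size=16):
    """Sum of absolute differences between a block and a shifted reference."""
    return sum(
        abs(cur[m][n] - ref[m + i][n + j])
        for m in range(size)
        for n in range(size)
    )


def full_search(cur, ref, search_range, size=16):
    """Try every offset (i, j) in [0, search_range] and keep the lowest cost.

    Returns (cost, i, j). Ties keep the first offset found.
    """
    best = None
    for i in range(search_range + 1):
        for j in range(search_range + 1):
            cost = sad(cur, ref, i, j, size)
            if best is None or cost < best[0]:
                best = (cost, i, j)
    return best


# Synthetic 20x20 reference frame; the current block is cut out at offset (2, 3).
ref = [[(r * 31 + c * 17) % 251 for c in range(20)] for r in range(20)]
cur = [row[3:19] for row in ref[2:18]]
print(full_search(cur, ref, 4))  # (0, 2, 3)
```

A three-step or hierarchical search would reuse the same `sad` cost and change only which offsets are visited, which is the part the slide assigns to the MCCU.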

# Stellar Architecture



## RAMDSP Programmability Targets MPEG-2


|     |     |     |     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| x00 | x01 | x02 | x03 | x04 | x05 | x06 | x07 | ... |

$$t_{ij} = \text{abs}(x_{ij} - r_{ij})$$

|     |     |     |     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| r00 | r01 | r02 | r03 | r04 | r05 | r06 | r07 | ... |

(8) 8-bit ops

|     |     |     |     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| t00 | t01 | t02 | t03 | t04 | t05 | t06 | t07 | ... |

$$a_{ij} = a_{ij} + t_{ij}$$

(4) 16-bit ops

|     |     |     |     |     |     |     |     |     |
|-----|-----|-----|-----|-----|-----|-----|-----|-----|
| t00 | t01 | t02 | t03 | t04 | t05 | t06 | t07 | ... |

$$a_{ij} = a_{ij} + t_{ij}$$

(4) 16-bit ops

|     |     |     |     |     |
|-----|-----|-----|-----|-----|
| a01 | a03 | a05 | a07 | ... |

# Stellar Architecture



## RAMDSP Integrated Memory Targets MPEG-2


### Motion Estimation Buffer Requirements

**RAMDSP™** features multiple Mbytes of memory resident near the execution unit, which allows quick access to predictor frames.



# Stellar Architecture

## RAMDSP Performance on 8x8 DCT


Depending on the data type, the EU can operate on 4 or 2 columns at a time per 64-bit data path.





# Stellar Architecture

## RAMDSP Performance on 8x8 DCT


Fast transpose operation keeps SIMD style column operations running at near-peak performance

|     |     |     |     |
|-----|-----|-----|-----|
| x00 | x01 | x02 | x03 |
| x10 | x11 | x12 | x13 |
| x20 | x21 | x22 | x23 |
| x30 | x31 | x32 | x33 |



# Stellar Architecture

## RAMDSP Performance on 8x8 DCT


- A single 8x8 DCT operation takes approximately 200 clock cycles on a 64-bit EU datapath.
- Assuming an 8:1 ratio between EU cycle times and DRAM cycle times, and a 512-bit wide data bus:
  - DRAM cycles to load data from memory: 2
  - DRAM cycles to compute the 8x8 DCT: 25
  - DRAM cycles to store data to memory: 2
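
The DRAM cycle counts above can be reproduced with a short calculation. The sketch assumes 16-bit coefficients (an assumption; the slide does not state the element width, but this value yields the 2-cycle transfers shown) and uses illustrative names.

```python
import math


def dct_dram_cycles(eu_cycles=200, eu_per_dram=8, bus_bits=512,
                    block_elems=64, elem_bits=16):
    """Return (load, compute, store) DRAM cycles for one 8x8 DCT block."""
    transfer = math.ceil(block_elems * elem_bits / bus_bits)
    compute = math.ceil(eu_cycles / eu_per_dram)
    return transfer, compute, transfer


print(dct_dram_cycles())  # (2, 25, 2)
```

Because loading and storing take far fewer DRAM cycles than the computation, the transfers for neighboring blocks can be overlapped with the Execution Unit work.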

MCCU:



EU:



# Stellar Architecture

## RAMDSP Bus Interface Unit for MPEG-2


- Bus Interface Unit from baseline configuration is enhanced to facilitate bitstream coding and decoding.
- Fast SRAM added for quick coefficient lookup, and access to quantizer matrices.
- Additional shift operations, leading one operations.
- Caching of data to aid in zigzag ordering before storing to DRAM, or writing to bus.

# Stellar Architecture

## Power Profile



# Stellar Architecture

## RAMDSP vs. Other MPEG-2 Encoder Solutions




|                      | CHROMATIC<br>MPACT 2             | C-CUBE<br>DVX          | TRIMEDIA<br>TM-1000    | RAMDSP<br>3+1    |
|----------------------|----------------------------------|------------------------|------------------------|------------------|
| Architecture         | VLIW<br>ME CoProc                | RISC Core<br>ME CoProc | VLIW<br>Decode<br>Core | SIMD -<br>Vector |
| Transistors          | 3.0 M                            | 5.5 M                  | 5.5 M                  | 2.0 M            |
| Peak Performance     | 6 Bops                           | 1 Bop + ME             | 3.8 Bops               | 11 Bops          |
| Memory Configuration | 4 - 8 MB<br>RAMBUS               | 8 MB<br>SDRAM          | SDRAM                  | 8 MB<br>embedded |
| Notes                | Toshiba may<br>embed 4MB<br>DRAM |                        | MPEG-2<br>decode only  |                  |

# Stellar Architecture

## RAMDSP Device Count Estimation




### Execution Unit plus Data Path

Register File 160k

Data Path ~ 50k

### MicroCode Store

16k SRAM

96k Transistors

### Address Generators and Controllers

SE AG ~ 5k

DMA AG ~ 6.5k

DMEM AG ~ 6.5k

LMEM AG ~ 4.0k

*Note (handwritten): a three-level reg execution data path would improve continuous performance so that one MEG engine could be eliminated, thus a reduction of 500,000 transistors.*

# Stellar Architecture

## Motion Estimation Unit



# Stellar Architecture




## Bus Interface Unit for MPEG

- 16k bits storage
- RISC like core

## MPEG Config Total Transistor Count

192,000 (RAM)

40,000 (Regular)

## General Purpose RAMDSP Engine

- Three motion estimation engines
- One special purpose BIU

- GP RAMDSP Engine
  - Execution Unit and Data Path

96k (RAM)

22k (regular)

192k (RAM)

---

520 k

# Stellar Architecture


## Total MPEG Device Count

- Implementation for MPEG Encode/Decode
  - GP RAMDSP: 520k
  - Three ME Engines: 1,156k
  - BIU: 132k
- Full chip: 1,808k + 64 Mbit DRAM (includes 116k bits SRAM)
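
The full-chip figure is the sum of the three block counts listed above. A quick check of that arithmetic is sketched below; the dictionary keys are illustrative labels. The second sum is only an observation: the per-item counts from the earlier device count slides (160k, 50k, 96k, 22k, 192k) also add up to 520k, which suggests, without confirming, how the GP engine total was built.

```python
# Transistor counts in thousands, as listed on the slide.
budget_k = {
    "GP RAMDSP engine": 520,
    "three ME engines": 1156,
    "BIU": 132,
}

total_k = sum(budget_k.values())
print(total_k)  # 1808

# Observation only: register file, data path, microcode store,
# address generators (5 + 6.5 + 6.5 + 4), and a further 192k of RAM.
gp_parts_k = [160, 50, 96, 5 + 6.5 + 6.5 + 4, 192]
print(sum(gp_parts_k))  # 520.0
```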

## SEQ and AG instructions

| UNIT   | instruction | op code  | operand 1 | operand 2 |
|--------|---------|----------|-----------|-----------|
| SEQ    | nop     | 00000000 |           |           |
| SEQ    | halt    | 00000001 |           |           |
| SEQ    | return  | 00000010 |           |           |
| SEQ    | jmp     | 00010000 | 4 bits    |           |
| SEQ    | call    | 00010001 | 4 bits    |           |
| SEQ    | putx    | 00010010 | 4 bits    |           |
| SEQ    | getx    | 00010011 | 4 bits    |           |
| SEQ    | putxhi  | 00010100 | 4 bits    |           |
| SEQ    | getxhi  | 00010101 | 4 bits    |           |
| SEQ    | getstat | 00010110 | 4 bits    |           |
| SEQ    | setsfft | 00010111 | 4 bits    |           |
| SEQ    | setcfg  | 00011000 | 4 bits    |           |
| SEQ    | setloop | 00011001 | 4 bits    |           |
| SEQ    | setcond | 00011010 | 4 bits    |           |
|        | inc s   | 0010000x | 4 bits    |           |
|        | dec d   | 0010001x | 4 bits    |           |
|        | add s   | 0011000x | 4 bits    |           |
|        | sub s   | 0011001x | 4 bits    |           |
|        | neg s   | 0011010x | 4 bits    |           |
|        | mov s   | 0011011x | 4 bits    |           |
|        | and s   | 0011100x | 4 bits    |           |
|        | or s    | 0011101x | 4 bits    |           |
|        | xor s   | 0011110x | 4 bits    |           |
|        | not s   | 0011111x | 4 bits    |           |
|        | shft cnt | 010000  | 6 bits    |           |
|        | brcond offset | 0110 | 12 bits |           |
| SEQ    | loop    | 0111     | 12 bits   |           |
| AG     | setavec | 0110     | 12 bits   |           |
| AG     | setmvec | 0111     | 12 bits   |           |
| AG     | ldi     | 10       | 10 bits   | 4 bits    |
| AG     | ldiu    | 11       | 10 bits   | 4 bits    |

Note: If x = 1, then set the DRAM address register with the ALU output.

## EU Adder instructions

|         |     |         |            |         |         |        |        |
|---------|-----|---------|------------|---------|---------|--------|--------|
| anop    |     |         |            | 0000000 | 0001111 | uhz    | 3 bits |
| pacc    | u/s | h/l     | 8/16       | 001uszz | 010uszz | 3 bits | 3 bits |
| padd    | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| psub    | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| pabdif  | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| parss   | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| pcomp   | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| psht    | u/s | sat/ton | 8/16/32    | s1      | s2      | 3 bits | 3 bits |
| pldzero |     |         | 8/16/32/64 | s1      | s2      | 3 bits | 3 bits |
| pldone  |     |         | 8/16/32/64 | s1      | s2      | 3 bits | 3 bits |
| csht    |     |         | 110uszz    | 110uszz | 3 bits  | 3 bits | 3 bits |
| dshift  |     |         | 11100zz    | 11100zz | 3 bits  | 3 bits | 3 bits |
| and     |     |         | 11101zz    | 11101zz | 3 bits  | 3 bits | 3 bits |
| or      |     |         | 1111000    | 1111000 | 3 bits  | 3 bits | 3 bits |
| xor     |     |         | 1111001    | 1111001 | 3 bits  | 3 bits | 3 bits |
| not     |     |         | 1111100    | 1111100 | 3 bits  | 3 bits | 3 bits |
|         |     |         | 1111101    | 1111101 | 3 bits  | 3 bits | 3 bits |

## EU Multiplier instructions

|         |     |            |         |         |         |        |        |
|---------|-----|------------|---------|---------|---------|--------|--------|
| mnop    |     |            |         | 0000000 | 000100s | uhz    | 3 bits |
| ppack   | u/s | h/l        | 8/16    | 0001010 | 0001010 | uhz    | 3 bits |
| punkp   | u/s | h/l        | 8/16    | 0001111 | 0001111 | uhz    | 3 bits |
| pacc2   | u/s | sat/ton    | 8/16/32 | s1      | s2      | 3 bits | 3 bits |
| padd2   | u/s | sat/ton    | 8/16/32 | s1      | s2      | 3 bits | 3 bits |
| psub2   | u/s | sat/ton    | 8/16/32 | s1      | s2      | 3 bits | 3 bits |
| pabdif2 | u/s | sat/ton    | 8/16/32 | s1      | s2      | 3 bits | 3 bits |
| parss2  | u/s | sat/ton    | 8/16/32 | s1      | s2      | 3 bits | 3 bits |
| pmul    | u/s | hh/h/h/h   | 8/16    | 010uszz | 010uszz | 3 bits | 3 bits |
| pmin    | u/s | 8/16/32/64 | s1      | s2      | d       | 3 bits | 3 bits |
| pmax    | u/s | 8/16/32/64 | s1      | s2      | d       | 3 bits | 3 bits |
| cndmrg  |     | 8/16/32    | s1      | s2      | d       | 3 bits | 3 bits |
| cndmov  |     | 8/16/32    | s1      | s2      | d       | 3 bits | 3 bits |

# Stellar Architecture




## RAMDSP FIFO Technology

- Input/Output data words can run synchronous or asynchronous at 200 MHz.
- Discontinuous streams on input or outputs are directly dealt with by refresh mechanisms.
- Refresh mechanisms are hidden from the FIFO interface and managed internally.
- Virtually all of the virtual FIFO storage is implemented in the DRAM with minimal logic overhead associated with the input and output stream.
- All FIFO wrap-around conditions are allocated and special conditions flags can be allocated as required.
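
The wrap-around and flag behavior described above is the classic circular-buffer FIFO. The sketch below models it in software, with a Python list standing in for the DRAM array. The class name, the full and empty flags, and the error handling are assumptions for illustration; refresh handling is not modeled.

```python
class RingFifo:
    """FIFO whose storage is a fixed array with wrapping read/write pointers."""

    def __init__(self, depth):
        self.mem = [None] * depth  # stands in for the DRAM storage
        self.depth = depth
        self.head = 0              # next location to read
        self.tail = 0              # next location to write
        self.count = 0

    @property
    def empty(self):
        return self.count == 0

    @property
    def full(self):
        return self.count == self.depth

    def push(self, word):
        if self.full:
            raise OverflowError("FIFO full")
        self.mem[self.tail] = word
        self.tail = (self.tail + 1) % self.depth  # wrap-around
        self.count += 1

    def pop(self):
        if self.empty:
            raise IndexError("FIFO empty")
        word = self.mem[self.head]
        self.head = (self.head + 1) % self.depth  # wrap-around
        self.count -= 1
        return word


fifo = RingFifo(3)
for w in (1, 2, 3):
    fifo.push(w)
print(fifo.pop())   # 1
fifo.push(4)        # write pointer wraps to slot 0
print([fifo.pop() for _ in range(3)])  # [2, 3, 4]
```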

# Stellar Architecture



## RAMDSP FIFO Technology

- Deep FIFO applications in the communications market stack multiple FIFO standard products from companies such as IDT and Cypress. Stellar can solve this problem with a single chip solution.
- Stellar is interested in developing a Silicon Compiler to develop both standard and customized applications for Micron's customers.
- Stellar is seeking \$1.0 M to develop the compiler and to port Micron's design rules to it.
- Payment is based upon a pre-paid one-time license fee and pre-paid royalties.

# Stellar Architecture



## RAMDSP Application to ATM Products

- ATM technologies being developed at Stellar utilize techniques substantially more advanced than input queuing or shared memory configurations.
- Stellar's focus in ATM is to provide high-performance, low-cost switches based upon its FIFO-oriented technology, resulting in the performance achieved by fine-grained output queuing technologies and/or buffered fabric designs.
- Stellar clearly believes that it is always better to have as large a memory as possible (particularly if it is designed for best throughput) if this results in better link utilization.

# Stellar Architecture



## RAMDSP Application to ATM Products

- Stellar switching technology will also address multicasting requirements for Internet and ATM networks.
- In addition, tight coupling between Stellar's switch technology and RAMDSP technology will support the emergence of a non-blocking switching backbone closely coupled to the signal-processing environment, e.g., to Stellar MPEG technologies and other RAMDSP signal processing.